NSF PAR Search | NSF Public Access Repository

Bounding the interleaving distance for mapper graphs with a loss function

https://doi.org/10.1007/s41468-025-00215-x

Chambers, Erin Wolf; Munch, Elizabeth; Percival, Sarah; Wang, Bei (September 2025, Journal of Applied and Computational Topology)

Full Text Available
Efficient Computation of a Semi-Algebraic Basis of the First Homology Group of a Semi-Algebraic Set

https://doi.org/10.1007/s00454-024-00626-0

Basu, Saugata; Percival, Sarah (September 2024, Discrete & Computational Geometry)

Let $$\R$$ be a real closed field and $$\C$$ the algebraic closure of $$\R$$. We give an algorithm for computing a semi-algebraic basis for the first homology group, $$\HH_1(S,\mathbb{F})$$, with coefficients in a field $$\mathbb{F}$$, of any given semi-algebraic set $$S \subset \R^k$$ defined by a closed formula. The complexity of the algorithm is bounded singly exponentially. More precisely, if the given quantifier-free formula involves $$s$$ polynomials whose degrees are bounded by $$d$$, the complexity of the algorithm is bounded by $$(s d)^{k^{O(1)}}$$. This algorithm generalizes well-known algorithms having singly exponential complexity for computing a semi-algebraic basis of the zeroth homology group of semi-algebraic sets, which is equivalent to the problem of computing a set of points meeting every semi-algebraically connected component of the given semi-algebraic set at a unique point. It is not known how to compute such a basis for the higher homology groups with singly exponential complexity. As an intermediate step in our algorithm we construct a semi-algebraic subset $$\Gamma$$ of the given semi-algebraic set $$S$$, such that $$\HH_q(S,\Gamma) = 0$$ for $$q=0,1$$. We relate this construction to a basic theorem in complex algebraic geometry stating that for any affine variety $$X$$ of dimension $$n$$, there exist Zariski closed subsets \[ Z^{(n-1)} \supset \cdots \supset Z^{(1)} \supset Z^{(0)} \] with $$\dim_\C Z^{(i)} \leq i$$, and $$\HH_q(X,Z^{(i)}) = 0$$ for $$0 \leq q \leq i$$. We conjecture a quantitative version of this result in the semi-algebraic category, with $$X$$ and $$Z^{(i)}$$ replaced by closed semi-algebraic sets. We make initial progress on this conjecture by proving the existence of $$Z^{(0)}$$ and $$Z^{(1)}$$ with complexity bounded singly exponentially (previously, such an algorithm was known only for constructing $$Z^{(0)}$$).
Full Text Available
Topological data analysis reveals core heteroblastic and ontogenetic programs embedded in leaves of grapevine (Vitaceae) and maracuyá (Passifloraceae)

https://doi.org/10.1371/journal.pcbi.1011845

Percival, Sarah; Onyenedum, Joyce G; Chitwood, Daniel H; Husbands, Aman Y (February 2024, PLOS Computational Biology)
Bollenbach, Tobias (Ed.)
Leaves are often described in language that evokes a single shape. However, embedded in that descriptor is a multitude of latent shapes arising from evolutionary, developmental, environmental, and other effects. These confounded effects manifest at distinct developmental time points and evolve at different tempos. Here, revisiting datasets comprised of thousands of leaves of vining grapevine (Vitaceae) and maracuyá (Passifloraceae) species, we apply a technique from the mathematical field of topological data analysis to comparatively visualize the structure of heteroblastic and ontogenetic effects on leaf shape in each group. Consistent with a morphologically closer relationship, members of the grapevine dataset possess strong core heteroblasty and ontogenetic programs with little deviation between species. Remarkably, we found that most members of the maracuyá family also share core heteroblasty and ontogenetic programs despite dramatic species-to-species leaf shape differences. This conservation was not initially detected using traditional analyses such as principal component analysis or linear discriminant analysis. We also identify two morphotypes of maracuyá that deviate from the core structure, suggesting the evolution of new developmental properties in this phylogenetically distinct sub-group. Our findings illustrate how topological data analysis can be used to disentangle previously confounded developmental and evolutionary effects to visualize latent shapes and hidden relationships, even ones embedded in complex, high-dimensional datasets.
Full Text Available
Drawing Reeb Graphs

Chambers, E; Fasy, Brittany T; Sereshgi, Erfan H; Löffler, Maarten; Percival, Sarah (September 2023, 31st International Symposium on Graph Drawing and Network Visualization, Revised Selected Papers, Part II)
Bekos, Michael A; Chimani, Markus (Ed.)
Full Text Available
A critical analysis of plant science literature reveals ongoing inequities

https://doi.org/10.1073/pnas.2217564120

Marks, Rose A.; Amézquita, Erik J.; Percival, Sarah; Rougon-Cardoso, Alejandra; Chibici-Revneanu, Claudia; Tebele, Shandry M.; Farrant, Jill M.; Chitwood, Daniel H.; VanBuren, Robert (March 2023, Proceedings of the National Academy of Sciences)

The field of plant science has grown dramatically in the past two decades, but global disparities and systemic inequalities persist. Here, we analyzed ~300,000 papers published over the past two decades to quantify disparities across nations, genders, and taxonomy in the plant science literature. Our analyses reveal striking geographical biases: affluent nations dominate the publishing landscape and vast areas of the globe have virtually no footprint in the literature. Authors in Northern America are cited nearly twice as many times as authors based in Sub-Saharan Africa and Latin America, despite publishing in journals with similar impact factors. Gender imbalances are similarly stark and show remarkably little improvement over time. Some of the most affluent nations have extremely male-biased publication records, despite supposed improvements in gender equality. In addition, we find that most studies focus on economically important crop and model species, and a wealth of biodiversity is underrepresented in the literature. Taken together, our analyses reveal a problematic system of publication, with persistent imbalances that poorly capture the global wealth of scientific knowledge and biological diversity. We conclude by highlighting disparities that can be addressed immediately and offer suggestions for long-term solutions to improve equity in the plant sciences.
Full Text Available
Expression‐based machine learning models for predicting plant tissue identity

https://doi.org/10.1002/aps3.11621

Palande, Sourabh; Arsenault, Jeremy; Basurto‐Lozada, Patricia; Bleich, Andrew; Brown, Brianna N I; Buysse, Sophia F; Connors, Noelle A; Das Adhikari, Sikta; Dobson, Kara C; Guerra‐Castillo, Francisco Xavier; et al (January 2025, Applications in Plant Sciences)

Premise: The selection of Arabidopsis as a model organism played a pivotal role in advancing genomic science. The competing frameworks to select an agricultural‐ or ecological‐based model species were rejected, in favor of building knowledge in a species that would facilitate genome‐enabled research. Methods: Here, we examine the ability of models based on Arabidopsis gene expression data to predict tissue identity in other flowering plants. Comparing different machine learning algorithms, models trained and tested on Arabidopsis data achieved near perfect precision and recall values, whereas when tissue identity is predicted across the flowering plants using models trained on Arabidopsis data, precision values range from 0.69 to 0.74 and recall from 0.54 to 0.64. Results: The identity of belowground tissue can be predicted more accurately than other tissue types, and the ability to predict tissue identity is not correlated with phylogenetic distance from Arabidopsis. k‐nearest neighbors is the most successful algorithm, suggesting that gene expression signatures, rather than marker genes, are more valuable to create models for tissue and cell type prediction in plants. Discussion: Our data‐driven results highlight that the assertion that knowledge from Arabidopsis is translatable to other plants is not always true. Considering the current landscape of abundant sequencing data, we should reevaluate the scientific emphasis on Arabidopsis and prioritize plant diversity.
Full Text Available
Topological data analysis reveals a core gene expression backbone that defines form and function across flowering plants

https://doi.org/10.1371/journal.pbio.3002397

Palande, Sourabh; Kaste, Joshua A M; Roberts, Miles D; Segura Abá, Kenia; Claucherty, Carly; Dacon, Jamell; Doko, Rei; Jayakody, Thilani B; Jeffery, Hannah R; Kelly, Nathan; et al (December 2023, PLOS Biology)
Drost, Hajk-Georg (Ed.)
Since they emerged approximately 125 million years ago, flowering plants have evolved to dominate the terrestrial landscape and survive in the most inhospitable environments on earth. At their core, these adaptations have been shaped by changes in numerous, interconnected pathways and genes that collectively give rise to emergent biological phenomena. Linking gene expression to morphological outcomes remains a grand challenge in biology, and new approaches are needed to begin to address this gap. Here, we implemented topological data analysis (TDA) to summarize the high dimensionality and noisiness of gene expression data using lens functions that delineate plant tissue and stress responses. Using this framework, we created a topological representation of the shape of gene expression across plant evolution, development, and environment for the phylogenetically diverse flowering plants. The TDA-based Mapper graphs form a well-defined gradient of tissues from leaves to seeds, or from healthy to stressed samples, depending on the lens function. This suggests that there are distinct and conserved expression patterns across angiosperms that delineate different tissue types or responses to biotic and abiotic stresses. Genes that correlate with the tissue lens function are enriched in central processes such as photosynthetic, growth and development, housekeeping, or stress responses. Together, our results highlight the power of TDA for analyzing complex biological data and reveal a core expression backbone that defines plant form and function.
Full Text Available
